这是对纸张“渐近稳定的适应性最优控制算法的简短评论,VAMVoudakis等人的”具有饱和致动器的渐近稳定的自适应 - 最优控制算法“。强化学习(RL)代理人的稳定性问题仍然很难,并且上述工作建议使用来自自适应控制的技术的合适稳定性属性 - 一个旨在添加到行动的强制性术语。但是,这种方法存在稳定RL的方法,我们将在本说明中解释。此外,Vamvoudakis等人。在通用政策下似乎在汉密尔顿时期已经造成了荒谬的假设。为了提供积极的结果,我们不仅会表明这个错误,而且表明了评论在随机连续环境下的批评神经网络权重聚,为行为政策持有提供了某些条件。
translated by 谷歌翻译
在本文中,我们为人形机器人领域提供了一个机器人模型和代码基础,以获得经济实惠的教育。我们概述了一个机器人的软件和硬件,赢得了2019 - 2019 - 2019年的罗布斯特队的几次比赛,提供了对机器人的当代教育市场的分析,并突出了某些设计解决方案之外的推理。
translated by 谷歌翻译
We present a dynamic path planning algorithm to navigate an amphibious rotor craft through a concave time-invariant obstacle field while attempting to minimize energy usage. We create a nonlinear quaternion state model that represents the rotor craft dynamics above and below the water. The 6 degree of freedom dynamics used within a layered architecture to generate motion paths for the vehicle to follow and the required control inputs. The rotor craft has a 3 dimensional map of its surroundings that is updated via limited range onboard sensor readings within the current medium (air or water). Path planning is done via PRM and D* Lite.
translated by 谷歌翻译
Biological systems often choose actions without an explicit reward signal, a phenomenon known as intrinsic motivation. The computational principles underlying this behavior remain poorly understood. In this study, we investigate an information-theoretic approach to intrinsic motivation, based on maximizing an agent's empowerment (the mutual information between its past actions and future states). We show that this approach generalizes previous attempts to formalize intrinsic motivation, and we provide a computationally efficient algorithm for computing the necessary quantities. We test our approach on several benchmark control problems, and we explain its success in guiding intrinsically motivated behaviors by relating our information-theoretic control function to fundamental properties of the dynamical system representing the combined agent-environment system. This opens the door for designing practical artificial, intrinsically motivated controllers and for linking animal behaviors to their dynamical properties.
translated by 谷歌翻译
Modern mobile burst photography pipelines capture and merge a short sequence of frames to recover an enhanced image, but often disregard the 3D nature of the scene they capture, treating pixel motion between images as a 2D aggregation problem. We show that in a "long-burst", forty-two 12-megapixel RAW frames captured in a two-second sequence, there is enough parallax information from natural hand tremor alone to recover high-quality scene depth. To this end, we devise a test-time optimization approach that fits a neural RGB-D representation to long-burst data and simultaneously estimates scene depth and camera motion. Our plane plus depth model is trained end-to-end, and performs coarse-to-fine refinement by controlling which multi-resolution volume features the network has access to at what time during training. We validate the method experimentally, and demonstrate geometrically accurate depth reconstructions with no additional hardware or separate data pre-processing and pose-estimation steps.
translated by 谷歌翻译
Reinforcement learning can enable robots to navigate to distant goals while optimizing user-specified reward functions, including preferences for following lanes, staying on paved paths, or avoiding freshly mowed grass. However, online learning from trial-and-error for real-world robots is logistically challenging, and methods that instead can utilize existing datasets of robotic navigation data could be significantly more scalable and enable broader generalization. In this paper, we present ReViND, the first offline RL system for robotic navigation that can leverage previously collected data to optimize user-specified reward functions in the real-world. We evaluate our system for off-road navigation without any additional data collection or fine-tuning, and show that it can navigate to distant goals using only offline training from this dataset, and exhibit behaviors that qualitatively differ based on the user-specified reward function.
translated by 谷歌翻译
Recent advances in batch (offline) reinforcement learning have shown promising results in learning from available offline data and proved offline reinforcement learning to be an essential toolkit in learning control policies in a model-free setting. An offline reinforcement learning algorithm applied to a dataset collected by a suboptimal non-learning-based algorithm can result in a policy that outperforms the behavior agent used to collect the data. Such a scenario is frequent in robotics, where existing automation is collecting operational data. Although offline learning techniques can learn from data generated by a sub-optimal behavior agent, there is still an opportunity to improve the sample complexity of existing offline reinforcement learning algorithms by strategically introducing human demonstration data into the training process. To this end, we propose a novel approach that uses uncertainty estimation to trigger the injection of human demonstration data and guide policy training towards optimal behavior while reducing overall sample complexity. Our experiments show that this approach is more sample efficient when compared to a naive way of combining expert data with data collected from a sub-optimal agent. We augmented an existing offline reinforcement learning algorithm Conservative Q-Learning with our approach and performed experiments on data collected from MuJoCo and OffWorld Gym learning environments.
translated by 谷歌翻译
Monitoring changes inside a reservoir in real time is crucial for the success of CO2 injection and long-term storage. Machine learning (ML) is well-suited for real-time CO2 monitoring because of its computational efficiency. However, most existing applications of ML yield only one prediction (i.e., the expectation) for a given input, which may not properly reflect the distribution of the testing data, if it has a shift with respect to that of the training data. The Simultaneous Quantile Regression (SQR) method can estimate the entire conditional distribution of the target variable of a neural network via pinball loss. Here, we incorporate this technique into seismic inversion for purposes of CO2 monitoring. The uncertainty map is then calculated pixel by pixel from a particular prediction interval around the median. We also propose a novel data-augmentation method by sampling the uncertainty to further improve prediction accuracy. The developed methodology is tested on synthetic Kimberlina data, which are created by the Department of Energy and based on a CO2 capture and sequestration (CCS) project in California. The results prove that the proposed network can estimate the subsurface velocity rapidly and with sufficient resolution. Furthermore, the computed uncertainty quantifies the prediction accuracy. The method remains robust even if the testing data are distorted due to problems in the field data acquisition. Another test demonstrates the effectiveness of the developed data-augmentation method in increasing the spatial resolution of the estimated velocity field and in reducing the prediction error.
translated by 谷歌翻译
Stacking (or stacked generalization) is an ensemble learning method with one main distinctiveness from the rest: even though several base models are trained on the original data set, their predictions are further used as input data for one or more metamodels arranged in at least one extra layer. Composing a stack of models can produce high-performance outcomes, but it usually involves a trial-and-error process. Therefore, our previously developed visual analytics system, StackGenVis, was mainly designed to assist users in choosing a set of top-performing and diverse models by measuring their predictive performance. However, it only employs a single logistic regression metamodel. In this paper, we investigate the impact of alternative metamodels on the performance of stacking ensembles using a novel visualization tool, called MetaStackVis. Our interactive tool helps users to visually explore different singular and pairs of metamodels according to their predictive probabilities and multiple validation metrics, as well as their ability to predict specific problematic data instances. MetaStackVis was evaluated with a usage scenario based on a medical data set and via expert interviews.
translated by 谷歌翻译
We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results but in a zero-shot transfer setting without the need for any fine-tuning. When compared to humans, the models approach their accuracy and robustness. We are releasing models and inference code to serve as a foundation for further work on robust speech processing.
translated by 谷歌翻译